Conversion Control Flags

Your application uses control flags to determine how the conversion of text from one encoding to another is performed. The conversion functions ConvertFromTextToUnicode (page 129), ConvertFromUnicodeToText (page 139), ConvertFromUnicodeToScriptCodeRun (page 155) and ConvertFromUnicodeToTextRun (page 150) allow you to set control flags specifying the conversion process behavior. You can also specify control flags for the function TruncateForUnicodeToText (page 162).
These functions take a controlFlags parameter whose value you can set using the bitmask constants defined for the flags. A different subset of control flags applies to each of these functions. Using the bitmask constants, you can perform a bitwise OR operation to set the pertinent flags for a particular function's parameters. For example, when you call a function, you might pass the following controlFlags parameter setting:
controlFlags = kUnicodeUseFallbacksMask | kUnicodeLooseMappingsMask;
The following enumerations define constants for the control flag masks:
enum {
   kUnicodeUseFallbacksBit = 0,  
   kUnicodeKeepInfoBit     = 1,  
   kUnicodeDirectionalityBits= 2,
   kUnicodeVerticalFormBit = 4,  
   kUnicodeLooseMappingsBit= 5,  
   kUnicodeStringUnterminatedBit = 6,
   kUnicodeTextRunBit      = 7,  
   kUnicodeKeepSameEncodingBit = 8,
   kUnicodeForceASCIIRangeBit = 9,
   kUnicodeNoHalfwidthCharsBit = 10
};
enum {
   kUnicodeUseFallbacksMask= 1L << kUnicodeUseFallbacksBit,
   kUnicodeKeepInfoMask    = 1L << kUnicodeKeepInfoBit,
   kUnicodeDirectionalityMask= 3L << kUnicodeDirectionalityBits,
   kUnicodeVerticalFormMask= 1L << kUnicodeVerticalFormBit,
   kUnicodeLooseMappingsMask= 1L << kUnicodeLooseMappingsBit,
   kUnicodeStringUnterminatedMask = 1L << 
                              kUnicodeStringUnterminatedBit,
   kUnicodeTextRunMask     = 1L << kUnicodeTextRunBit,
   kUnicodeKeepSameEncodingMask = 1L << kUnicodeKeepSameEncodingBit,
   kUnicodeForceASCIIRangeMask = 1L << kUnicodeForceASCIIRangeBit,
   kUnicodeNoHalfwidthCharsMask = 1L << kUnicodeNoHalfwidthCharsBit
};
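As a rough sketch of how these masks combine, the fragment below repeats a few of the constants so that it compiles on its own (in a real project they would come from the Unicode Converter header) and shows setting, testing, and clearing individual flags. The helper names HasFlag, ClearFlag, and ExampleControlFlags are hypothetical and are not part of the Text Encoding Conversion Manager.

```c
/* A few constants repeated from the enumerations above so this sketch stands alone. */
enum {
   kUnicodeUseFallbacksBit  = 0,
   kUnicodeKeepInfoBit      = 1,
   kUnicodeLooseMappingsBit = 5
};
enum {
   kUnicodeUseFallbacksMask  = 1L << kUnicodeUseFallbacksBit,
   kUnicodeKeepInfoMask      = 1L << kUnicodeKeepInfoBit,
   kUnicodeLooseMappingsMask = 1L << kUnicodeLooseMappingsBit
};

/* Hypothetical helper: nonzero if every bit in mask is set in flags. */
int HasFlag(unsigned long flags, unsigned long mask)
{
   return (flags & mask) == mask;
}

/* Hypothetical helper: returns flags with the bits in mask cleared. */
unsigned long ClearFlag(unsigned long flags, unsigned long mask)
{
   return flags & ~mask;
}

/* Builds the setting used in the example above: fallbacks plus loose mappings. */
unsigned long ExampleControlFlags(void)
{
   return kUnicodeUseFallbacksMask | kUnicodeLooseMappingsMask;
}
```

Because each mask is a single bit shifted into place by its bit number, ORing masks never disturbs flags that are already set, and clearing one flag leaves the others intact.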
The following enumeration defines the possible settings for the directionality bits:
enum {
   kUnicodeDefaultDirection = 0,
   kUnicodeLeftToRight = 1,
   kUnicodeRightToLeft = 2
};
enum {
   kUnicodeDefaultDirectionMask = 
            kUnicodeDefaultDirection << kUnicodeDirectionalityBits,
   kUnicodeLeftToRightMask =
            kUnicodeLeftToRight << kUnicodeDirectionalityBits,
   kUnicodeRightToLeftMask =
            kUnicodeRightToLeft << kUnicodeDirectionalityBits
};
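Unlike the other flags, directionality occupies a two-bit field, so changing it means clearing the whole field before writing the new value. The following is a minimal sketch of that idea; the constants are repeated from the enumerations above, and the helpers SetDirection and GetDirection are hypothetical names introduced only for illustration.

```c
/* Constants repeated from the enumerations above so this sketch stands alone. */
enum { kUnicodeDirectionalityBits = 2 };
enum { kUnicodeDirectionalityMask = 3L << kUnicodeDirectionalityBits };
enum {
   kUnicodeDefaultDirection = 0,
   kUnicodeLeftToRight      = 1,
   kUnicodeRightToLeft      = 2
};

/* Hypothetical helper: replaces the two-bit direction field, keeping all other flags. */
unsigned long SetDirection(unsigned long flags, unsigned long direction)
{
   flags &= ~(unsigned long)kUnicodeDirectionalityMask;   /* clear both field bits */
   return flags | ((direction << kUnicodeDirectionalityBits)
                   & kUnicodeDirectionalityMask);
}

/* Hypothetical helper: extracts the direction setting from a control flags value. */
unsigned long GetDirection(unsigned long flags)
{
   return (flags & kUnicodeDirectionalityMask) >> kUnicodeDirectionalityBits;
}
```

Simply ORing kUnicodeRightToLeftMask into a value that already holds kUnicodeLeftToRightMask would leave both field bits set, which is not one of the defined settings; clearing the field first avoids that.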
Constant descriptions

kUnicodeUseFallbacksMask
A mask for setting the Unicode-use-fallbacks conversion control flag. The Unicode Converter uses fallback mappings when it encounters a source text element for which there is no equivalent destination encoding. Fallback mappings are mappings that do not preserve the meaning or identity of the source character but represent a useful approximation of it. See the function SetFallbackUnicodeToText (page 172).
kUnicodeKeepInfoMask
A mask for setting the keep-information control flag, which governs whether the Unicode Converter keeps the current state stored in the Unicode converter object before converting the text string.

If you clear this flag, the converter will initialize the Unicode converter object before converting the text string and assume that subsequent calls do not need any context, such as direction state for the current call.

If you set the flag, the converter uses the current state. This is useful if your application must convert a stream of text in pieces that are not block delimited. You should set this flag for each call in a series of calls on the same text stream.
kUnicodeDefaultDirectionMask
kUnicodeLeftToRightMask
kUnicodeRightToLeftMask
You can specify one of these masks to indicate the global, or base, line direction for the text being converted. This determines which direction the converter should use for resolution of neutral coded characters, such as spaces that occur between sets of coded characters having different directions--for example, between Latin and Arabic characters--rendering ambiguous the direction of the space character.

The value kUnicodeDefaultDirectionMask tells the converter to use the value of the first strong direction character in the string, kUnicodeLeftToRightMask tells the converter that the base paragraph direction is left to right, and kUnicodeRightToLeftMask tells the converter that the base paragraph direction is right to left.
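To make "first strong direction character" concrete, here is a deliberately simplified sketch. It is not the Unicode Converter's implementation: it recognizes only basic Latin letters as strongly left to right and only the Hebrew letters U+05D0 through U+05EA and Arabic letters U+0621 through U+064A as strongly right to left, whereas a real implementation would consult the full Unicode bidirectional character properties. The function name FirstStrongDirection is hypothetical.

```c
#include <stddef.h>

/* Direction settings repeated from the enumeration above. */
enum {
   kUnicodeDefaultDirection = 0,
   kUnicodeLeftToRight      = 1,
   kUnicodeRightToLeft      = 2
};

/*
 * Hypothetical helper: scans 16-bit Unicode text for the first character with a
 * strong direction and returns the matching setting. Spaces, digits, and
 * punctuation are skipped. Returns kUnicodeDefaultDirection if no strong
 * character is found.
 */
int FirstStrongDirection(const unsigned short *text, size_t length)
{
   size_t i;

   for (i = 0; i < length; i++) {
      unsigned short ch = text[i];

      /* Simplification: only A-Z and a-z count as strong left-to-right. */
      if ((ch >= 0x0041 && ch <= 0x005A) || (ch >= 0x0061 && ch <= 0x007A))
         return kUnicodeLeftToRight;

      /* Simplification: only these Hebrew and Arabic letters count as strong right-to-left. */
      if ((ch >= 0x05D0 && ch <= 0x05EA) || (ch >= 0x0621 && ch <= 0x064A))
         return kUnicodeRightToLeft;
   }
   return kUnicodeDefaultDirection;
}
```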
kUnicodeVerticalFormMask
A mask for setting the vertical form control flag. The vertical form control flag tells the Unicode Converter how to map text elements for which there are both abstract and vertical presentation forms in the destination encoding.

If set, the converter maps these text elements to their vertical forms, if they are available. For explanation of presentation forms, see Chapter 1, "About Text Encodings and Conversions."
kUnicodeLooseMappingsMask
A mask that determines whether the Unicode Converter should use the loose-mapping portion of a mapping table for character mapping if the strict mapping portion of the table does not include a destination encoding equivalent for the source text element.

If you clear this flag, the converter will use only the strict equivalence portion.

If you set this flag and a conversion for the source text element does not exist in the strict equivalence portion of the mapping table, then the converter uses the loose mapping section. For explanation of strict and loose mapping, see Chapter 1, "About Text Encodings and Conversions."
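The strict-then-loose lookup order can be sketched as follows. This is an illustrative model only: the MapEntry type, the table contents, and the functions FindInTable and MapCharacter are all hypothetical, and the entries are made up rather than taken from any real mapping table.

```c
#include <stddef.h>

/* Mask value repeated from the enumerations above (bit 5). */
enum { kUnicodeLooseMappingsMask = 1L << 5 };

/* One entry of a hypothetical mapping table: Unicode value to destination byte. */
typedef struct {
   unsigned short unicode;
   unsigned char  dest;
} MapEntry;

/* Made-up table contents for illustration only. */
static const MapEntry kStrictTable[] = { { 0x0041, 0x41 }, { 0x002D, 0x2D } };
static const MapEntry kLooseTable[]  = { { 0x2010, 0x2D } };  /* HYPHEN as HYPHEN-MINUS */

/* Linear search; returns the destination byte, or -1 if ch is not in the table. */
static int FindInTable(const MapEntry *table, size_t count, unsigned short ch)
{
   size_t i;

   for (i = 0; i < count; i++)
      if (table[i].unicode == ch)
         return table[i].dest;
   return -1;
}

/* Tries the strict portion first; consults the loose portion only if the flag is set. */
int MapCharacter(unsigned short ch, unsigned long controlFlags)
{
   int result = FindInTable(kStrictTable,
                            sizeof kStrictTable / sizeof kStrictTable[0], ch);

   if (result < 0 && (controlFlags & kUnicodeLooseMappingsMask))
      result = FindInTable(kLooseTable,
                           sizeof kLooseTable / sizeof kLooseTable[0], ch);
   return result;   /* -1 means no mapping was found */
}
```

In this model a character that has a strict mapping always uses it, whether or not the flag is set; the loose portion only fills gaps.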
kUnicodeStringUnterminatedMask
A mask for setting the string-unterminated control flag. Determines how the Unicode Converter handles text-element boundaries and direction resolution at the end of an input buffer.

If you clear this bit, the converter treats the end of the buffer as the end of text.

If you set this bit, the converter assumes that the next call you make using the current context will supply another buffer of text that should be treated as a continuation of the current text. For example, if the last character in the input buffer is 'A', ConvertFromUnicodeToText stops conversion at the 'A' and returns kTECIncompleteElementErr, because the next buffer could begin with a combining diacritical mark that should be treated as part of the same text element. If the last character in the input buffer is a control character, ConvertFromUnicodeToText does not return kTECIncompleteElementErr because a control character could not be part of a multiple character text element.

This flag also affects direction resolution. If the Unicode Converter reaches the end of the current input buffer while the direction of the current text element is still unresolved, the setting determines what happens next. If you clear this flag, the converter treats the end of the buffer as a block separator for direction resolution. If you set this flag, the converter sets the direction as undetermined.
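When a stream is converted in pieces, the keep-information and string-unterminated flags are typically chosen together. The sketch below assumes a caller that knows whether the buffer it is about to pass is the last one; the function FlagsForChunk and its parameters are hypothetical, and the mask values are repeated from the enumerations above.

```c
/* Mask values repeated from the enumerations above (bits 1 and 6). */
enum {
   kUnicodeKeepInfoMask           = 1L << 1,
   kUnicodeStringUnterminatedMask = 1L << 6
};

/*
 * Hypothetical helper: derives the control flags for one buffer of a stream.
 * Keep-information is set on every call in the series. String-unterminated is
 * set for every buffer except the last, so that only the final buffer's end is
 * treated as the end of text.
 */
unsigned long FlagsForChunk(unsigned long baseFlags, int isLastChunk)
{
   unsigned long flags = baseFlags | kUnicodeKeepInfoMask;

   if (isLastChunk)
      flags &= ~(unsigned long)kUnicodeStringUnterminatedMask;
   else
      flags |= kUnicodeStringUnterminatedMask;
   return flags;
}
```

A caller using flags like these would presumably also need to handle kTECIncompleteElementErr by carrying any unconverted trailing characters over into the next buffer; that bookkeeping is not shown here.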
kUnicodeTextRunMask
A mask for setting the text-run control flag, which determines how the Unicode Converter converts Unicode text to a non-Unicode encoding when more than one possible destination encoding exists.

If you clear this flag, the function ConvertFromUnicodeToTextRun (page 150) or ConvertFromUnicodeToScriptCodeRun (page 155) attempts to convert the Unicode text to the single encoding from the list of encodings in the Unicode converter object that produces the best result, that is, that provides for the greatest amount of source text conversion.

If you set this flag, ConvertFromUnicodeToTextRun or ConvertFromUnicodeToScriptCodeRun, which are the only functions to which it applies, may generate a destination string that combines text in any of the encodings specified by the Unicode converter object.
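One way to picture the "best result" choice made when the flag is clear is as a maximum over the candidate encodings. The sketch below is a simplified model, not the converter's actual selection logic: it assumes the caller already knows how many source characters each candidate encoding could convert, and it breaks ties in favor of the earlier encoding in the list. The function name BestSingleEncoding is hypothetical.

```c
#include <stddef.h>

/*
 * Hypothetical helper: convertibleCounts[i] is the number of source characters
 * that candidate encoding i could convert. Returns the index of the encoding
 * that converts the most; on a tie, the earlier index wins (an assumption).
 * encodingCount is assumed to be at least 1.
 */
size_t BestSingleEncoding(const size_t convertibleCounts[], size_t encodingCount)
{
   size_t best = 0;
   size_t i;

   for (i = 1; i < encodingCount; i++)
      if (convertibleCounts[i] > convertibleCounts[best])
         best = i;
   return best;
}
```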
kUnicodeKeepSameEncodingMask
A mask for setting the keep-same-encoding control flag. Determines how the Unicode Converter treats the conversion of Unicode text following a text element that could not be converted to the first destination encoding when multiple destination encodings exist. This control flag applies only if the kUnicodeTextRunMask control flag is set.

If you set this flag, the function ConvertFromUnicodeToTextRun (page 150) attempts to minimize encoding changes in the conversion of the source text string; that is, once it is forced to make an encoding change, it attempts to use that encoding as the conversion destination for as long as possible.

If you clear this flag, ConvertFromUnicodeToTextRun attempts to keep most of the converted string in one encoding, switching to other encodings only when necessary.
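The difference between the two settings can be illustrated with a toy model. In the sketch below each source character carries a bitmask of the hypothetical destination encodings able to represent it, and encoding 0 is assumed to be the preferred one. None of this reflects the converter's internal algorithm; the function names and the capability-mask representation are assumptions made purely for illustration, and the model assumes the text-run flag is set.

```c
#include <stddef.h>

/* Mask values repeated from the enumerations above (bits 7 and 8). */
enum {
   kUnicodeTextRunMask          = 1L << 7,
   kUnicodeKeepSameEncodingMask = 1L << 8
};

/* Index of the lowest set bit of a nonzero mask (the most preferred encoding). */
static int LowestEncoding(unsigned mask)
{
   int index = 0;

   while ((mask & 1u) == 0) {
      mask >>= 1;
      index++;
   }
   return index;
}

/*
 * Hypothetical model: canEncode[i] has bit e set if encoding e can represent
 * character i; every entry is assumed to be nonzero. Returns the number of
 * encoding changes in the resulting run sequence.
 */
size_t CountEncodingChanges(const unsigned canEncode[], size_t count,
                            unsigned long controlFlags)
{
   int keepSame = (controlFlags & kUnicodeTextRunMask) &&
                  (controlFlags & kUnicodeKeepSameEncodingMask);
   int current = -1;          /* no encoding chosen yet */
   size_t changes = 0;
   size_t i;

   for (i = 0; i < count; i++) {
      int chosen;

      if (keepSame && current >= 0 && (canEncode[i] & (1u << current)))
         chosen = current;                        /* stay in the current encoding */
      else
         chosen = LowestEncoding(canEncode[i]);   /* most preferred encoding available */

      if (current >= 0 && chosen != current)
         changes++;
      current = chosen;
   }
   return changes;
}
```

For five characters with capability masks 1, 3, 2, 3, 3, the model switches encodings once with the keep-same-encoding flag set (it stays in encoding 1 after being forced there) and twice with it clear (it returns to encoding 0 as soon as it can).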
kUnicodeForceASCIIRangeMask
A mask for setting the force ASCII range control flag. If an encoding normally treats 1-byte code points 0x00 through 0x7F as an ISO 646 national variant that is different from ASCII, setting this flag forces 0x00 through 0x7F to be treated as ASCII. For example, Japanese encodings such as Shift-JIS generally treat 0x00 through 0x7F as JIS Roman, with 0x5C as YEN SIGN instead of REVERSE SOLIDUS, but when converting a DOS file path you may want to set this flag so that 0x5C is mapped as REVERSE SOLIDUS.
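The effect on the 0x5C code point can be sketched as below. The function DecodeLowByte is a hypothetical stand-in for the converter's handling of the 0x00 through 0x7F range in a Japanese encoding; it models only the YEN SIGN difference described above and treats every other byte in the range as ASCII.

```c
/* Mask value repeated from the enumerations above (bit 9). */
enum { kUnicodeForceASCIIRangeMask = 1L << 9 };

/*
 * Hypothetical helper: maps a byte in the range 0x00-0x7F to a Unicode value.
 * Without the flag, 0x5C is treated as JIS Roman YEN SIGN (U+00A5).
 * With the flag, the whole range is treated as ASCII, so 0x5C becomes
 * REVERSE SOLIDUS (U+005C).
 */
unsigned short DecodeLowByte(unsigned char byte, unsigned long controlFlags)
{
   if (!(controlFlags & kUnicodeForceASCIIRangeMask) && byte == 0x5C)
      return 0x00A5;          /* YEN SIGN */
   return byte;               /* ASCII values match their Unicode code points */
}
```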
kUnicodeNoHalfwidthCharsMask
A mask for setting the no halfwidth characters control flag. Japanese encodings such as Shift-JIS and EUC-JP include a set of halfwidth katakana characters derived from JIS X0201 (0xA1 through 0xDF in Shift-JIS, 0x8EA1 through 0x8EDF in EUC-JP). Setting this flag makes the Unicode Converter treat these encodings as if they did not include the halfwidth katakana and makes the corresponding code points unmappable.
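As a minimal sketch of this behavior for Shift-JIS, the function below maps the single-byte range 0xA1 through 0xDF to the Unicode halfwidth katakana range U+FF61 through U+FF9F unless the flag is set. The function name and the kUnmappable sentinel are hypothetical; how the converter itself reports an unmappable code point is not modeled here.

```c
/* Mask value repeated from the enumerations above (bit 10). */
enum { kUnicodeNoHalfwidthCharsMask = 1L << 10 };

/* Hypothetical sentinel meaning "no mapping". */
#define kUnmappable 0xFFFF

/*
 * Hypothetical helper: maps a Shift-JIS halfwidth katakana byte to Unicode.
 * Returns kUnmappable for bytes outside 0xA1-0xDF, and for every byte when
 * the no-halfwidth-characters flag is set.
 */
unsigned short DecodeShiftJISHalfwidthKatakana(unsigned char byte,
                                               unsigned long controlFlags)
{
   if (byte < 0xA1 || byte > 0xDF)
      return kUnmappable;     /* not a halfwidth katakana byte */
   if (controlFlags & kUnicodeNoHalfwidthCharsMask)
      return kUnmappable;     /* flag set: treat the range as absent */
   return (unsigned short)(0xFF61 + (byte - 0xA1));
}
```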